.. _fixed_effects:

Fixed Effects Estimation
***************************

.. |pid ppfadl| raw:: html

    "pid"

.. |pid ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/pid}{\textbf{"pid"}}

.. |syear ppfadl| raw:: html

    "syear"

.. |syear ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/syear}{\textbf{"syear"}}

.. |netto ppfadl| raw:: html

    "netto"

.. |netto ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/netto}{\textbf{"netto"}}

.. |phrf ppfadl| raw:: html

    "phrf"

.. |phrf ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/phrf}{\textbf{"phrf"}}

.. |sex ppfadl| raw:: html

    "sex"

.. |sex ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/sex}{\textbf{"sex"}}

.. |pgfamstd pgen| raw:: html

    "pgfamstd"

.. |pgfamstd pgen2| raw:: latex

    \href{https://paneldata.org/soep-core/data/pgen/pgfamstd}{\textbf{"pgfamstd"}}

.. |gebjahr ppfadl| raw:: html

    "gebjahr"

.. |gebjahr ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/gebjahr}{\textbf{"gebjahr"}}

.. |pop ppfadl| raw:: html

    "pop"

.. |pop ppfadl2| raw:: latex

    \href{https://paneldata.org/soep-core/data/ppathl/pop}{\textbf{"pop"}}

.. |pglabgro pgen| raw:: html

    "pglabgro"

.. |pglabgro pgen2| raw:: latex

    \href{https://paneldata.org/soep-core/data/pgen/pglabgro}{\textbf{"pglabgro"}}

.. |pgtatzeit pgen| raw:: html

    "pgtatzeit"

.. |pgtatzeit pgen2| raw:: latex

    \href{https://paneldata.org/soep-core/data/pgen/pgtatzeit}{\textbf{"pgtatzeit"}}

.. |pgexpft pgen| raw:: html

    "pgexpft"

.. |pgexpft pgen2| raw:: latex

    \href{https://paneldata.org/soep-core/data/pgen/pgexpft}{\textbf{"pgexpft"}}

.. |pgbilzeit pgen| raw:: html

    "pgbilzeit"

.. |pgbilzeit pgen2| raw:: latex

    \href{https://paneldata.org/soep-core/data/pgen/pgbilzeit}{\textbf{"pgbilzeit"}}

Let's say you want to find out whether certain variables relevant to the labor market, such as work experience or time in education, influence a person's hourly wage.
Other variables such as gender or marital status should also be taken into account. You decide to use the SOEP data to set up a fixed effects estimation model.

**Create a path with four subfolders:**

.. figure:: png/uebungspfade.png
    :align: center

**Example:**

- H:/material/exercises/do
- H:/material/exercises/output
- H:/material/exercises/temp
- H:/material/exercises/log

These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define your paths with globals:

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 12-20

The global "AVZ" defines the main path. The main path is subdivided using the globals "MY_IN_PATH", "MY_DO_FILES", "MY_LOG_OUT", "MY_OUT_DATA", and "MY_OUT_TEMP". The global "MY_IN_PATH" contains the path to your data.

**a) Generate your own SOEPwage.dta dataset. The dataset should contain information on gross monthly wage, marital status, and other personal characteristics.**

To perform your analysis, you need different SOEP variables. The SOEP offers various options for a variable search:

- Search the questionnaires for useful variables (for more information, see the section :ref:`quest_search`)
- Find a suitable variable in the topic list at paneldata.org (for more information, see the section :ref:`topic`)
- Search for a suitable variable using a search term in paneldata.org (for more information, see the section :ref:`var_search`)
- Use the documentation provided for the generated variables (for more information, see the section :ref:`documentation`)

Use the ppfadl.dta dataset with its key variables as your starting file.
Your source file should contain the following variables:

- Individual identifier |pid ppfadl| |pid ppfadl2|
- Survey year |syear ppfadl| |syear ppfadl2|
- Birth year |gebjahr ppfadl| |gebjahr ppfadl2|
- The net variable with information on the interview type |netto ppfadl| |netto ppfadl2|
- The weighting variable |phrf ppfadl| |phrf ppfadl2|
- The gender of the person |sex ppfadl| |sex ppfadl2|
- Sample membership |pop ppfadl| |pop ppfadl2|

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 22

.. Attention:: Please note that since version 34 (v34), PPFADL has been renamed PPATHL. The following exercises are done with version 33.1 (v33.1), where the tracking file was named PPFADL.

Merge the necessary content variables into your starting dataset. You need the following variables for your analysis:

- Employment status ``plb0022_h``
- Current gross income in euros |pglabgro pgen| |pglabgro pgen2|
- Actual weekly working hours |pgtatzeit pgen| |pgtatzeit pgen2|
- Full-time work experience |pgexpft pgen| |pgexpft pgen2|
- Years of education or training |pgbilzeit pgen| |pgbilzeit pgen2|
- Marital status in survey year |pgfamstd pgen| |pgfamstd pgen2|

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 24-25

Only keep people who have completed an interview and who live in a private household.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 27-31

Since you are only interested in the period from 2012 to 2016, remove all survey information that does not fall within this period. To finish, save your dataset.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 33-34

**Exercise 1: Prepare your dataset**

**a) Load your created SOEPWage.dta dataset. It contains information on gross monthly wage, marital status, and other personal characteristics.**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 38-40

**b) Recode all missing values to system missings (.)**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 42-43

For more information about the missing codes for SOEP data, see the chapter :ref:`missings`.

**c) Generate the variables "hourly wage" (gross monthly wage / (4.33 * weekly working hours)) for persons who have earned at least 1 euro and have worked at least one hour, "Married vs. Unmarried", and age.**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 45-51

**d) Adjust the variable "hourly wage" for outliers by setting values smaller than the first percentile to the first percentile. Set values greater than 3 times the 99th percentile to 3 times the 99th percentile. Then generate the variable lwage = log(wage).**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 53-61

**Exercise 2: Descriptive statistics**

**a) Define the dataset as a panel dataset.**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 63-65

**b) What percentage of people participated in all five waves (xtdescribe)?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 67-68

.. figure:: png/fixed_01.png
    :align: center

42808 respondents contributed information within waves bc (2012) to bg (2016), and about 40% (17069) of them provided information for all five waves.

**c) Describe the variable "Married" with xttab and xttrans. Take a look at the wage development of some individuals (pid==30320901, pid==30932501, pid==3101602, pid==3101801) with xtline.**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 70-72

.. figure:: png/fixed_02.png
    :align: center

41.37 percent of the person-year observations have "married==no". Within the period from 2012 to 2016, 19717 people responded at least once that they were not married. During the same period, 25014 persons reported at least once that they were married.
Those who were not married for at least one year responded with "married==no" in 94.69 percent of their observations, whereas those who were married at least once responded with "married==yes" in 95.88 percent of their observations. This indicates very stable response behavior.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 82-83

.. figure:: png/fixed_03.png
    :align: center

96.87 percent of the person-year observations with "married==no" are still not married in the next period. 98.51 percent of the person-year observations with "married==yes" are also married in the following period. This is further evidence of stable response behavior.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 89-90

.. figure:: png/fixed_04.png
    :align: center

The graph compares the hourly wage development of four different respondents.

**Exercise 3: Pooled OLS Regression**

**a) Execute a pooled OLS regression with "log hourly wage" as dependent variable and "married", "gender", "work experience" and "years of education" as independent variables. Interpret the coefficients for "married", "gender" and "years of education". Why are these not causal effects?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 92-94

.. figure:: png/fixed_05.png
    :align: center

The variables married, sex, and pgbilzeit most likely correlate with omitted or unobserved variables that also affect the wage. For example, women more often work in occupations with lower wages.

**b) Run the regression again with the option "vce(cluster pid)" to get clustered standard errors. How do the standard errors of the coefficients change?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 101-102

.. figure:: png/fixed_06.png
    :align: center

The standard errors become larger, because repeated observations of the same person are not independent of each other.

**Exercise 4: Fixed Effects**

**a) Subtract the person-specific mean value from each variable of the model. Use the "egen" function. Ideally you should also use a loop.**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 105-116

**b) Estimate the fixed effects model with the previously generated variables. Why isn't a coefficient estimated for "gender"? How do the coefficients change compared to the pooled OLS estimate? Is the effect of "married" now causally interpretable?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 120

.. figure:: png/fixed_07.png
    :align: center

No coefficient was estimated for gender because gender was constant over time for all persons, so the demeaned variable has no variation left. The coefficient of married is now significant at the 5% level.

**c) Now estimate the fixed effects model using the command "xtreg lwage married sex pgexpft pgbilzeit, fe". What do you notice about the coefficients compared to task 4 b)? And about the standard errors?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 128-129

.. figure:: png/fixed_08.png
    :align: center

The coefficients are not identical to 4 b), and the standard errors become larger because the model in b) does not take into account the estimation of the mean values in the standard errors.

**d) Now add dummy variables for the years (i.syear). What happens to the effect of "labor market experience"?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 135-136

.. figure:: png/fixed_09.png
    :align: center

The effects of the variables remain significant. The model could possibly be misspecified: the Mincer equation also includes (potential) labor market experience squared.

**e) Now also add squared labor market experience to the model. To what extent does the effect of labor market experience change compared to task 4 d)?**

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 140-141

.. figure:: png/fixed_10.png
    :align: center

The coefficients of pgexpft and pgexpft^2 remain significant, whereas the coefficient for married is no longer significant.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 142

.. figure:: png/fixed_11.png
    :align: center

The graph shows that the effect of labor market experience decreases after approximately 15 years of professional experience.

**f) Now estimate the model from task 4 e) with longitudinal weights. Why is the number of cases now significantly smaller? Why could the coefficient of "pgbilzeit" have changed?**

.. TIP:: Create your own longitudinal person weights, e.g., a longitudinal person weight from wave A to wave D. Take the cross-sectional weight of the starting wave (aphrf) and multiply it by the staying factor of each following wave, as in the following example: ``gen adphrf=aphrf*bpbleib*cpbleib*dpbleib``

Since you are looking at the period 2012-2016, you must create a suitable longitudinal weight. To do this, use the phrf dataset from the RAW subdirectory. Merge the required variables into your analysis dataset and generate your period-related longitudinal weight. To understand the structure of the data distribution file and the location of the different datasets, visit the section :ref:`datasets`. For more information about the weighting datasets and other survey datasets, see the section :ref:`survey`.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 145-151

Now estimate the model from 4 e) and use the created weight.

.. literalinclude:: docs/Fixed_Effects.do
    :linenos:
    :lines: 153

.. figure:: png/fixed_12.png
    :align: center

The number of observations is now much smaller. The effect of pgbilzeit is greater than before. pgbilzeit has a smaller effect in the wlong==0 group, i.e., the return to each additional year of education differs there. People in the wlong==0 group may not get the returns on additional education they expected on the local labor market and may therefore move, which leads to a higher dropout probability.

Last change: |today|
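
As a sketch of the weighting rule from the tip in task 4 f), the longitudinal weight of person :math:`i` for a period from a starting wave :math:`t_0` to a final wave :math:`T` can be written as the cross-sectional weight of the starting wave multiplied by the staying factors of all following waves. The symbols :math:`w`, :math:`\mathit{phrf}` and :math:`\mathit{pbleib}` with wave subscripts are notation assumed here for illustration only; the actual variable names should be looked up in the phrf dataset:

.. math::

    w_{i}^{t_0 \rightarrow T} = \mathit{phrf}_{i,t_0} \cdot \prod_{t = t_0 + 1}^{T} \mathit{pbleib}_{i,t}

For the example in the tip (wave A to wave D), this corresponds to ``adphrf = aphrf * bpbleib * cpbleib * dpbleib``. For the period 2012-2016, :math:`t_0 = 2012` and :math:`T = 2016`, so the product contains four staying factors. Under this construction, a person needs a valid weight component in every wave of the period to enter the weighted estimation, which is consistent with the smaller number of observations in the weighted model.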